SOFTWARE COMPLEXITY RESEARCH PROGRAM


Department  of  Defense  (DOD)  software  production  and 
maintenance  is  a  large,  poorly  understood,  and  inefficient 
process.  Recently  Frost  and  Sullivan  (The  Military  Software 
Market,  1977)  estimated  the  yearly  cost  for  software  within 
DOD to be as large as $9 billion. DeRoze (1977) has also
estimated  that  115  major  defense  systems  depend  on  software 
for  their  success.  In  an  effort  to  find  near-term  solutions 
to  software  related  problems,  the  DOD  has  begun  to  support 
research into the software production process. A formal 5-year
R&D plan (Carlson & DeRoze, 1977) related to the
management and control of computer resources was recently
written in response to DOD Directive 5000.29 (1976). This
plan  requested  research  leading  to  the  identification  and 
validation  of  metrics  for  software  quality. 

Interest  continues  to  grow  in  the  use  of  quantitative 
metrics  which  assess  the  complexity  of  software.  Such 
metrics  are  assumed  to  be  valuable  aids  in  determining  the 
quality  of  software.  Boehm,  Brown,  and  Lipow  (1976)  and 
McCall,  Richards,  and  Walters  (1977)  have  proposed 
combinations  of  such  metrics  which  assess  numerous  factors 
that  collectively  constitute  this  nebulous  "software 
quality".  Such  factors  include  reliability,  portability, 
maintainability, and myriad other xxx-abilities.

There  are  numerous  potential  uses  for  measures  which 
assess  these  various  quality  factors.  First,  they  can  be 
used  as  feedback  to  programmers  during  development, 
indicating  potential  problems  with  code  they  have  developed 
(Elshoff, 1978). Use of metrics in this way would require
guidelines  for  altering  code  so  as  to  bring  different 
metric  values  within  acceptable  limits. 

A  second  use  for  metrics  is  in  guiding  software  testing. 
McCabe  (1976)  proposed  the  cyclomatic  number  as  a  means  of 
assessing  the  computational  complexity  of  the  software 
testing  problem.  Other  metrics  which  index  the  quality  or 
complexity  of  software  may  help  identify  modules  or 
subroutines  which  are  likely  to  be  the  most  error-prone. 

A third use for software metrics is in estimating
maintenance requirements. If one or more metrics
can be empirically related to the difficulty programmers
experience in working with software, then more accurate
estimates can be made of the staffing levels that will be
necessary during maintenance. Empirical validity studies
will be necessary before employing metrics for any of the
three uses described here. Such research should be conducted
with professional programmers.

The  experimental  investigation  described  in  this  report 
is  part  of  a  research  program  seeking  to  provide  information 
about  the  psychological  and  human  resource  aspects  of 
computer  programming.  The  challenge  undertaken  in  this 
research  was  to  quantify  the  psychological  complexity  of 
software.  It  is  important  to  distinguish  clearly  between  the 
psychological  and  computational  complexity  of  software. 
Computational  complexity  refers  to  characteristics  of 
algorithms  or  programs  which  make  their  proof  of  correctness 
difficult,  lengthy,  or  impossible.  For  example,  as  the 
number  of  distinct  paths  through  a  program  increases,  the 
computational  complexity  also  increases.  Psychological 
complexity  refers  to  those  characteristics  of  software  which 
make  human  understanding  of  software  more  difficult.  No 
direct  linear  relationship  between  computational  and 
psychological  complexity  is  expected.  A  program  with  many 
control  paths  may  not  be  psychologically  complex.  Any 
regularity  to  the  branching  process  within  a  program  may  be 
used  by  a  programmer  to  simplify  understanding  of  the 
program.

Halstead  (1977)  has  recently  developed  a  theory 
concerned  with  the  psychological  aspects  of  computer 
programming.  His  theory  provides  objective  estimates  of  the 
effort  and  time  required  to  generate  a  program,  the  effort 
required  to  understand  a  program,  and  the  number  of  bugs  in  a 
particular program (Fitzsimmons & Love, 1978). Some
predictions  of  the  theory  are  counterintuitive  and  contradict 
results  of  previous  psychological  research.  The  theory  has 
attracted  attention  because  independent  tests  of  hypotheses 
derived  from  it  have  proven  amazingly  accurate. 

Although  predictions  of  programmer  behavior  have  been 
particularly  impressive,  much  of  the  research  testing 
Halstead's  theory  has  been  performed  without  sufficient 
experimental  or  statistical  controls.  Further,  much  of  the 
data  were  based  upon  imprecise  estimating  techniques. 
Nevertheless,  the  available  evidence  has  been  sufficient  to 
justify  a  rigorous  evaluation  of  the  theory. 

Rather  than  conduct  a  research  program  designed 
specifically  to  test  Halstead's  theory  of  software  science,  a 
research  strategy  was  chosen  which  would  generate  suggestions 
for  improving  programmer  efficiency  regardless  of  the  success 
of any particular theory. This research focused on four
phases of the software life cycle: understanding,
modification, debugging, and construction. Since different
cognitive processes are assumed to predominate in each phase,
no single experiment or set of experiments on a particular
phase was believed to provide a sufficient basis for making
broad recommendations for improving programmer efficiency.
Each  experiment  in  this  research  program  was  designed  to  test 
important  variables  assumed  to  affect  a  particular  phase  of 
software  development.  Professional  programmers  were  used  in 
these  experiments  to  provide  the  greatest  possible  external 
validity  for  the  results  (Campbell  &  Stanley,  1966).  In 
addition,  the  theory  of  software  science  and  other  related 
metrics  were  evaluated  with  these  data.  This  experiment,  the 
fourth  in  the  series,  concentrated  on  the  construction  phase 
of  software  development. 


ABSTRACT 


An  experiment  was  conducted  to  assess  the  utility  of 
complexity  metrics  for  the  prediction  of  programmer 
performance  in  the  construction  of  software.  After 
practicing  on  a  preliminary  program,  each  of  the  nine 
participants developed three experimental programs on-line.
An English language description of each problem was presented
in addition to one of the following specification formats:
1) program design language, 2) tree chart, and 3) both of
these  techniques.  No  significant  differences  were  found  in 
the  times  to  construct  programs  from  these  different  types 
of  specification  formats.  The  software  complexity  metrics 
developed  by  Halstead  and  McCabe  were  found  to  be 
significantly  better  predictors  of  the  time  to  complete  the 
program  than  the  number  of  statements. 


INTRODUCTION


The  impact  of  the  cost  of  software  has  become 
increasingly apparent over the last decade. In the late
1950s most of the cost of computer systems was for hardware,
but now 90% of the costs are for software (Shneiderman,
1980). The design and construction process has great impact
on the subsequent operations and maintenance portions of a
software system, which account for most life cycle costs
(Boehm, 1973). Several experiments evaluating program
construction  have  been  performed  (Boies  &  Gould,  1974; 
Dunsmore  &  Gannon,  1978;  Love,  1977;  Lucas  &  Kaplan,  1974; 
Miller  &  Thomas,  1976;  Newsted,  1974;  Shneiderman  &  Mayer, 
1979;  Shneiderman,  Mayer,  McKay,  &  Heller,  1977;  Sime, 
Arblaster,  &  Green,  1977).  One  problem  with  these  studies 
has  been  the  inability  to  examine  individual  programmer 
strategies.  Love  (1977)  and  Dunsmore  &  Gannon  (1978)  were 
forced  to  deduce  strategies  from  subjective  examination  of 
successive runs of programs, as were Sackman, Erickson, &
Grant (1968), Sime et al. (1977), and Youngs (1974). Myers
(1978)  was  only  able  to  look  at  the  final  product. 

In  an  attempt  to  collect  more  objective  data  about  the 
programming  process,  the  Software  Management  Research  Unit  at 
General  Electric  has  established  a  software  research 
laboratory.  The  microcomputer  at  the  core  of  this  lab  keeps 
an  audit  trail  of  all  the  actions  of  a  user  during  the 
construction,  editing  and  debugging  of  a  program.  We  can 
examine  the  time  spent  on  various  portions  of  tasks  and 
actual  changes  and  additions  made,  rather  than  relying  on 
assumptions  about  programmer  behavior.  A  description  of  the 
laboratory  can  be  found  in  Appendix  F. 

The  present  experiment  had  three  main  purposes:  1)  to 
demonstrate  that  the  laboratory  is  a  feasible  data  collection 
tool,  2)  to  examine  the  impact  of  two  design  specification 
formats,  and  3)  to  assess  various  metrics  of  program 
complexity  for  predicting  programming  effort. 

There  are  two  primary  approaches  to  design  specification 
formats  in  current  software  projects:  verbal  and  graphical 
descriptions  (Jones,  1979).  A  number  of  studies  in  cognitive 
psychology  have  indicated  that  verbal  descriptions  are 
sequential  in  nature,  emphasizing  ordering  relationships 
(Kintsch  &  van  Dijk,  1978;  Paivio,  1971).  Wright  and  Reid 
(1973)  have  demonstrated  that  verbal  descriptions  are 
retained  better  over  a  period  of  days  than  those  presented 
graphically. 

The alternate approach to verbal specification of design
is a graphic representation of the program. This method is
well suited to stepwise refinement (Wirth, 1971) and to some
theories of the organization of human memory (Collins &
Quillian, 1969; Kintsch & van Dijk, 1978; Paivio, 1971;
Ramsey,  Atwood,  &  Van  Doren,  1978).  Although  graph-oriented 
documentation  is  widely  used  throughout  industry, 
flowcharting  is  one  graphical  method  that  has  been  shown  to 
be  of  questionable  utility  (Newsted,  1979;  Ramsey,  Atwood,  & 
Van  Doren,  1978;  Shneiderman,  Mayer,  McKay  &  Heller,  1977). 

To  compare  the  two  forms  of  design,  the  two 
representations  of  a  program  must  have  the  same  information 
content.  A  number  of  authors  have  demonstrated  mappings  from 
trees  or  hierarchical  structures  to  sequential  constrained 
language  such  as  a  Program  Design  Language  (Jackson,  1975; 
McClure,  1978;  Stay,  1976).  A  tree  allows  a  functional 
representation  of  a  program,  but  it  is  difficult  to  represent 
the  ordering  and  selection  criteria.  Jackson  approaches  this 
goal  in  his  data  trees  with  symbols  for  selection  and 
iteration.  In  the  present  experiment  a  program  is  represented 
as  a  tree  or  a  hierarchy  of  functions,  with  lower  levels 
representing more detail. Where a selection criterion
applies, the corresponding edge of the graph is labeled with
that criterion.

A  Program  Design  Language  (PDL)  was  chosen  as  the  verbal 
sequential  representation  in  this  experiment.  The  PDL  was 
chosen  because  it  was  possible  to  map  precisely  from  the  tree 
structure  to  the  PDL. 

A program specification represented in a tree structure
must be translated into a sequential, program-like form.
Since this format adds an additional translation step into
the  construction  process,  it  can  be  argued  that  it  is  best  to 
specify  the  detailed  design  with  a  verbal  description 
originally.  This  study  attempted  to  determine  whether  the 
specification  format  significantly  influenced  the 
construction  process. 

Previous  experiments  in  this  program  of  research 
(Curtis,  Sheppard,  &  Milliman,  1979;  Curtis,  Sheppard, 
Milliman,  Borst,  &  Love,  1979)  have  shown  that  the 
psychological  complexity  of  a  computer  program  can  be 
quantified  by  using  the  Halstead  (1977)  and  McCabe  (1976) 
complexity  metrics.  These  experiments  were  concerned  with 
performance  during  comprehension,  modification,  and  debugging 
tasks.  Because  the  metrics  are  also  relevant  to  the 
construction  process,  their  ability  to  predict  programming 
time  was  evaluated  during  this  experiment. 

Halstead's theory of software science argues that
algorithms  have  measurable  characteristics  and  that  a  number 
of  useful  measures  can  be  derived  from  simple  counts  of 
operators  and  operands.  From  these  quantities  Halstead 
(1977)  developed  measures  for  the  overall  program  length, 
potential  smallest  volume  of  an  algorithm,  actual  volume  of 
an  algorithm  (the  difficulty  of  understanding  a  program), 
language  level  (a  constant  for  a  given  language),  programming 
effort  (number  of  mental  discriminations  required  to  generate 
a program), program development time, and number of delivered
bugs  in  a  system.  Halstead's  theory  has  been  the  subject  of 
considerable  evaluative  research  (Fitzsimmons  &  Love,  1978). 
Correlations  often  greater  than  .90  have  been  reported 
between  the  Halstead  metrics  and  such  measures  as  the  number 
of  bugs  in  a  program,  programming  time,  debugging  time,  and 
algorithm  purity. 

Thomas  McCabe  (1976)  has  defined  complexity  in  relation 
to  the  decision  structure  of  a  program.  He  assesses 
complexity  as  it  affects  the  testability  and  reliability  of  a 
module. McCabe's complexity metric, v(G), is the classical
graph-theory  cyclomatic  number  indicating  the  number  of 
regions  in  a  graph,  or  in  the  current  usage,  the  number  of 
linearly  independent  control  paths  comprising  a  program.  As 
is  true  of  Halstead's  measures,  McCabe's  metric  has  been 
shown  to  be  a  better  predictor  of  performance  than  a  simple 
count of lines of code (Curtis, Sheppard, & Milliman,
1979). The present experiment sought to confirm these
predictions  in  a  program  construction  task. 


METHOD 


Participants 

Nine  professional  programmers  participated  in  this 
experiment.  They  averaged  4.7  years  of  programming 
experience, ranging from less than 1 to 12 years (SD = 4.1).

Procedure 


A  packet  of  materials  prepared  for  each  participant 
included:  written  instructions  on  the  experimental  procedure 
(Appendix  A),  instructions  for  using  the  operating  system  and 
the  Fortran  compiler  (Appendix  B),  a  short  preliminary  task 
(Appendix  C),  and  three  experimental  tasks  (Appendix  D). 

Since  the  material  in  Appendix  B  was  rather  long,  it  was 
presented  to  the  participants  the  day  before  the  experiment 
so  they  could  become  familiar  with  the  instructions  prior  to 
the  experimental  session. 

A  session  was  conducted  with  an  individual  programmer  at 
the  CRT  terminal  of  a  microcomputer.  In  addition  to  the 
written  materials,  the  participants  were  given  some  prompting 
on-line  in  the  form  of  instructions  such  as  "Please  turn  to 
the  first  problem".  Following  an  initial  practice  problem, 
participants  were  presented  with  three  separate  programs 
comprising  their  experimental  tasks.  A  questionnaire  was 
presented  on-line  after  the  experimental  tasks  were  completed 
(Appendix  E).  The  experimenter  was  present  at  all  times  and 
ready  to  act  as  a  reference  source  for  explaining  how  to  use 
the  computer,  the  editor,  or  the  Fortran  compiler. 

Participants  worked  at  their  own  speed,  signaling  the 
instructor  when  prepared  to  execute  a  compilation.  Due  to 
idiosyncrasies in the automated data collection system, all
programs  were  compiled  and  run  by  the  experimenter.  A 
program  that  was  not  correct  at  the  first  submission  was 
returned  to  the  participant,  and  repeated  submissions  were 
executed  until  the  program  had  been  run  successfully. 
Successful  completion  required  producing  the  correct  output 
from  an  input  data  file  that  was  hidden  from  the  participant. 

A  detailed  record  of  each  response  by  a  participant  was 
recorded  automatically  by  the  data  collection  system  of  the 
microcomputer.  An  internal  timer  accurate  to  one-hundredth  of 
a  second  recorded  the  time  for  each  of  these  responses. 

Independent  Variables 

Programs.  Four  short  algorithms  were  selected  for  the 
general  understandability  of  their  content.  The  practice 
problem required computing the average of a list of numbers.
One experimental program required the alphabetic matching of
strings  of  characters.  Another  required  summing  the  positive 
and  negative  values  of  a  set  of  numbers.  The  last  algorithm 
found  the  maximum  and  minimum  of  a  set  of  numbers.  Because 
participants  were  expected  to  complete  a  practice  problem  and 
three  experimental  tasks  within  three  hours,  the  algorithms 
were  necessarily  short  and  uncomplicated. 

Documentation. The practice problem included a
functional  specification  in  natural  language,  sample  input, 
and  sample  output.  No  additional  documentation  was  given  in 
the  practice  problem.  The  experimental  tasks  included  one  of 
three  types  of  additional  documentation:  a  PDL  description  of 
the  program  function,  a  tree  structure  showing  the  function, 
or  both  the  PDL  and  tree  structures. 

The  purpose  of  the  additional  documentation  was  to 
present  the  functional  decomposition  of  the  algorithm  in 
detail.  The  PDL  specification  was  a  sequential,  constrained 
language description of the functions to be performed in the
program.  Indentation  was  used  to  specify  the  more  detailed 
levels  of  the  process,  thus  making  the  description  partly 
hierarchical  in  nature.  The  tree  representation  was  designed 
for  ordinary  preorder  traversal  (Knuth,  1973).  That  is,  the 
vertical  dimension  indicated  levels  of  abstraction  of  the 
functions  to  be  performed,  with  the  most  detailed  levels 
occurring  lowest  in  the  tree.  The  horizontal  dimension 
indicated  order  of  progression  from  left  to  right  with  each 
"father"  node  being  processed  first.  The  leftmost  "son" 
followed the "father", and the progression followed to the
right  until  all  "sons"  had  been  processed.  Branching  and 
iteration  were  indicated  by  labelling  the  condition  for 
execution  on  the  edge  leading  to  the  node.  Where  no  label 
appeared,  the  node  was  always  executed. 
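
The correspondence between the tree and the indented PDL can be illustrated with a short sketch. This is only an illustration under stated assumptions: the `Node` class, the `to_pdl` function, and the example tree are hypothetical and are not the experimental materials. The sketch walks a labeled tree in preorder (parent first, then children from left to right) and emits one indented line per node, prefixing the edge condition where one is present.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One function in the design tree; `condition` labels the incoming edge."""
    name: str
    condition: Optional[str] = None  # None means the node is always executed
    children: List["Node"] = field(default_factory=list)


def to_pdl(node: Node, depth: int = 0) -> List[str]:
    """Preorder walk: emit the parent first, then its children left to right."""
    label = f"{node.condition}: " if node.condition else ""
    lines = ["    " * depth + label + node.name]
    for child in node.children:
        lines.extend(to_pdl(child, depth + 1))
    return lines


# Hypothetical tree for a maximum/minimum task (not the actual specification).
tree = Node("FIND MAXIMUM AND MINIMUM", children=[
    Node("INITIALIZE MAX AND MIN TO FIRST VALUE"),
    Node("EXAMINE VALUE", condition="FOR EACH REMAINING VALUE", children=[
        Node("REPLACE MAX", condition="IF VALUE > MAX"),
        Node("REPLACE MIN", condition="IF VALUE < MIN"),
    ]),
    Node("PRINT MAX AND MIN"),
])

pdl_lines = to_pdl(tree)
print("\n".join(pdl_lines))
```

Indentation depth in the output plays the role of the vertical dimension of the tree, and line order plays the role of the left-to-right dimension.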


Experimental  Design 

A within-subjects 3^2 factorial design was employed.
Each of the three programs was prepared with the three types
of  documentation:  the  PDL,  the  tree  structure  or  both  the  PDL 
and  tree  structures.  A  matrix  of  these  nine  experimental 
conditions  is  shown  in  Figure  1.  Previous  experiments 
employing  the  design  had  shown  learning  effects  (Sheppard, 
Curtis, Milliman, & Love, 1979). Therefore, the order of
presentation  of  the  tasks  was  counterbalanced  so  that  each 
program  appeared  as  the  first,  second,  or  third  task  equally 
often.  Each  type  of  documentation  was  similarly 
counterbalanced  according  to  order  of  presentation. 
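
One way to obtain this kind of counterbalancing for nine participants is to rotate the program order and the documentation order independently. The sketch below is an assumption made for illustration; the report does not state the actual assignment scheme, and the `schedule` function and its labels are hypothetical.

```python
from collections import Counter

PROGRAMS = ["Program 1", "Program 2", "Program 3"]
FORMATS = ["PDL", "tree", "PDL + tree"]


def schedule(participant: int):
    """Return the (program, format) pair for each of the three task positions."""
    program_shift = participant % 3   # rotates the order of programs
    format_shift = participant // 3   # rotates the order of documentation types
    return [(PROGRAMS[(pos + program_shift) % 3], FORMATS[(pos + format_shift) % 3])
            for pos in range(3)]


schedules = [schedule(p) for p in range(9)]

# Tally how often each program, format, and pairing lands in each position.
program_by_position = Counter(
    (pos, prog) for s in schedules for pos, (prog, _) in enumerate(s))
format_by_position = Counter(
    (pos, fmt) for s in schedules for pos, (_, fmt) in enumerate(s))
cell_by_position = Counter(
    (pos, prog, fmt) for s in schedules for pos, (prog, fmt) in enumerate(s))

for participant, s in enumerate(schedules, start=1):
    print(participant, s)
```

Under this scheme each program and each documentation type appears three times in every serial position, and each of the nine program-by-documentation cells appears once in every position.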


Individual  Difference  Measures 

Scores  on  the  practice  problem  were  used  as  a  measure  of 
programming  ability  related  to  the  experimental  task. 
Participants  were  also  asked  to  complete  a  questionnaire 
about  their  programming  experience. 


Dependent  Variables 

The  major  dependent  variable  was  the  total  time  required 
to successfully accomplish a task. An internal timer accurate
to one-hundredth of a second recorded each of the actions
of the participant at the microcomputer. The total time
was  computed  by  summing  all  of  these  intervals.  It  did  not 
include  time  for  compilation  and  execution  of  the  program, 
tasks  performed  by  the  experimenter. 

A  second  dependent  variable,  the  study  time,  was  the 
time  between  the  presentation  of  the  problem  to  the 
participant  and  the  initial  entry  of  an  instruction  into  the 
computer  system  (i.e.,  the  first  ADD  command).  This  variable 
measured the planning or thinking time of the participant.
The study time was included in the total time.


Complexity  Metrics 

Halstead's  V  and  E.  Halstead's  volume  (V)  and  effort 
(E)  metrics  were  computed  precisely  from  a  program  (based  on 
Ottenstein,  1976)  whose  input  was  the  source  code  listings  of 
the  27  programs,  three  from  each  of  the  nine  participants. 
The computational formulas are:

$$V = (N_1 + N_2) \log_2 (n_1 + n_2)$$

$$E = \frac{n_1 N_2 (N_1 + N_2) \log_2 (n_1 + n_2)}{2 n_2}$$

where

$n_1$ = # of unique operators

$n_2$ = # of unique operands

$N_1$ = total # of all operators

$N_2$ = total # of all operands
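
As a worked illustration, the two metrics can be computed directly from the four counts. The sketch below uses the standard software science definitions, V = (N1 + N2) log2 (n1 + n2) and E = V * n1 * N2 / (2 * n2); the function name and the example counts are hypothetical and do not come from the programs analyzed here.

```python
import math


def halstead_volume_effort(n1: int, n2: int, N1: int, N2: int):
    """Return (V, E) from unique/total operator and operand counts.

    n1, n2: unique operators and operands; N1, N2: total occurrences.
    """
    volume = (N1 + N2) * math.log2(n1 + n2)
    effort = volume * n1 * N2 / (2 * n2)
    return volume, effort


# Hypothetical counts chosen so that the logarithm is a whole number.
V, E = halstead_volume_effort(n1=10, n2=6, N1=30, N2=34)
print(f"V = {V:.1f}, E = {E:.1f}")
```

Because E multiplies V by a term that grows with the number of unique operators and the average reuse of operands, two programs of similar volume can differ widely in effort.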

McCabe's v(G). McCabe's metric is the classical graph-theory
cyclomatic number defined as:

v(G) = # edges - # nodes + 2(# connected components).


McCabe  presents  two  simpler  methods  of  calculating  v(G): 
the  number  of  predicate  nodes  plus  1,  or  the  number  of 
regions  computed  from  a  planar  graph  of  the  control  flow. 
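
The definition and the predicate-count shortcut can be compared on a small control-flow graph. The following is a minimal sketch; the graph, the node names, and both function names are hypothetical, and the shortcut as coded assumes that every decision node has exactly two outgoing edges.

```python
from collections import Counter


def cyclomatic_number(nodes, edges):
    """v(G) = e - n + 2p, with p found by a small union-find over the nodes."""
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for source, target in edges:
        parent[find(source)] = find(target)
    components = len({find(node) for node in nodes})
    return len(edges) - len(nodes) + 2 * components


def predicates_plus_one(edges):
    """Shortcut: number of two-way decision nodes plus 1."""
    out_degree = Counter(source for source, _ in edges)
    return sum(1 for degree in out_degree.values() if degree == 2) + 1


# Hypothetical control flow: a loop whose body contains one IF statement.
nodes = ["entry", "loop_test", "if_test", "then", "join", "exit"]
edges = [
    ("entry", "loop_test"),
    ("loop_test", "if_test"), ("loop_test", "exit"),
    ("if_test", "then"), ("if_test", "join"),
    ("then", "join"),
    ("join", "loop_test"),
]

v_definition = cyclomatic_number(nodes, edges)
v_shortcut = predicates_plus_one(edges)
print(v_definition, v_shortcut)
```

For this graph both calculations give 3 (7 edges, 6 nodes, 1 connected component; 2 predicate nodes plus 1).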

Number  of  statements.  The  length  of  the  program  was  the 
total  number  of  statements  added  minus  the  number  of 
statements  deleted. 


RESULTS 


All  27  programs  (nine  participants  with  three  programs 
each)  were  completed  successfully.  The  number  of  tries  to 
get a clean compilation ranged from 1 to 3 (M = 1.3), and the
total tries to run successfully ranged from 1 to 4 (M = 1.7).
The average total time to complete a program was 21.3
minutes (SD = 14.6), and the average study time was 3.3
minutes (SD = 4.2). There were significant differences in
the times required to construct each of the three programs
(p ≤ .001), accounting for over 50% of the variance in
performance.  Program  1  required  an  average  of  35.9  minutes 
to  complete,  while  Programs  2  and  3  averaged  14.4  and  13.6 
minutes,  respectively. 

Four  metrics  of  program  complexity  were  computed  for 
each  of  the  programs  constructed  by  the  participants: 
Halstead's V and E, McCabe's v(G), and the number of program
statements. Descriptive statistics for the metrics on each
of the three programs are presented in Table 1. The mean
values  for  Programs  2  (summing  positive  and  negative  numbers) 
and  3  (finding  maximum  and  minimum  values)  were  similar.  The 
means  for  Program  1  (alphabetic  matching)  were  larger,  and 
the  standard  deviations  for  the  Halstead  and  McCabe  metrics 
were  much  greater.  The  range  of  individual  differences  in 
the  implementation  of  these  programs  was  striking,  and  these 
differences  were  most  prominent  on  Halstead's  effort  metric. 

Table  2  shows  the  intercorrelations  among  the  software 
complexity  metrics.  As  expected,  V  and  E  were  highly 
correlated.  Number  of  statements  was  not  as  well  correlated 
with  other  metrics. 

Pearson  Product-Moment  correlations  between  the  metrics 
and  both  the  total  time  and  study  time  are  reported  in  Table 
3.  Study  times  were  unrelated  to  the  metrics.  However,  over 
all  27  programs  the  Halstead  and  McCabe  metrics  were  better 
predictors  of  total  time  when  compared  to  the  number  of 
statements in the programs. The volume metric (r = .78) was a
significantly (p ≤ .01) better predictor of total time than
was the effort metric (r = .61).
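
For readers who wish to reproduce this kind of analysis, the Pearson product-moment correlation can be computed as in the sketch below. The data values are invented for illustration and are not the experimental data; `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented values: a volume-like metric and total time in minutes.
metric = [150, 200, 260, 310, 400]
minutes = [9, 12, 15, 22, 30]

r = pearson_r(metric, minutes)
print(f"r = {r:.2f}")
```

The same function would be applied once per metric and per time measure to fill a table of correlations such as Table 3.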


Table 1

Descriptive Statistics for Metrics by Program

                               Standard          Range
Metric               Mean     deviation    Minimum   Maximum   Max/Min

Program 1:
  Statements         26.6        10.2         17        51        3.0
  Halstead's V      417.2       170.3        223       747        3.3
  Halstead's E     8816.1      8165.4       1951     27655       14.2
  McCabe's v(G)      14.4         9.3          6        34        5.7

Program 2:
  Statements         20.7         7.3         12        33        2.8
  Halstead's V      215.6        46.1        156       299        1.9
  Halstead's E     2523.5       118.5       1768      5199        2.9
  McCabe's v(G)       4.2         1.0          3         6        2.0

Program 3:
  Statements         19.1        10.9         12        46        3.8
  Halstead's V      188.4        87.5        126       404        3.2
  Halstead's E     2311.2      2766.0        908      9538       10.5
  McCabe's v(G)       4.9         0.6          4         6        1.5


Table 2

Intercorrelations Among Complexity Metrics

                   Number of          Halstead's
Metric             Statements      Volume     Effort

Halstead's V         .59***
Halstead's E         .65***        .95***
McCabe's v(G)        .52**         .81***     .85***

Note: n = 27
**p ≤ .01
***p ≤ .001


When  the  data  were  separated  by  program,  strong 
differences  emerged  in  the  ability  of  the  metrics  to  predict 
performance  time.  For  Programs  2  and  3,  both  Halstead 
metrics  were  exceptionally  strong  predictors  of  time.  No 
such  relationship  could  be  found  for  Program  1.  Regressions 
for  each  metric  on  total  time  are  presented  in  Figures  2 
through  5  with  scores  for  Program  1  circled  in  each  case. 
Times  for  Program  1  were  generally  longer  and  showed  more 
variability than those for Programs 2 and 3. The
scatterplots in Figures 3 through 5 appear to support a
curvilinear relationship. However, this appearance is almost
entirely the result of two datapoints from Program 1.

Halstead  (1977)  presents  a  way  of  estimating  the  time 
required  to  generate  a  program.  He  develops  this  estimator 
by  dividing  the  effort  metric  (which  is  presented  as  the 
number  of  mental  discriminations  required  to  generate  the 
program)  by  the  Stroud  (1967)  number  of  18  mental 
discriminations  per  second.  It  is  evident  from  Figure  4  that 
this  measure  consistently  underestimates  the  actual  amount 
of  time  required  to  develop  the  program. 
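
The size of the underestimate can be checked against the values reported above. The sketch below divides the mean effort for each program (Table 1) by the Stroud number and compares the result with the observed mean completion times given in the Results. Because the estimator is linear in E, applying it to the mean E gives the mean of the individual estimates. The function and variable names are hypothetical.

```python
STROUD_NUMBER = 18.0  # mental discriminations per second (Stroud, 1967)


def estimated_minutes(effort: float) -> float:
    """Halstead's time estimate, E / 18 seconds, converted to minutes."""
    return effort / STROUD_NUMBER / 60.0


# Mean E from Table 1 and observed mean total times from the Results text.
mean_effort = {"Program 1": 8816.1, "Program 2": 2523.5, "Program 3": 2311.2}
observed_minutes = {"Program 1": 35.9, "Program 2": 14.4, "Program 3": 13.6}

estimates = {name: estimated_minutes(e) for name, e in mean_effort.items()}
for name in mean_effort:
    print(f"{name}: estimated {estimates[name]:.1f} min, "
          f"observed {observed_minutes[name]:.1f} min")
```

For all three programs the estimate falls well below the observed mean, consistent with the underestimation noted above.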

No  significant  effects  on  performance  were  observed 
among  the  three  methods  of  documentation  that  were  presented 
with  the  programs. 

The  average  pretest  time  was  30.9  minutes  with  an 
average  of  3.1  tries  to  complete  the  program  successfully. 


Table 3

Correlations Between Performance Time
and Software Complexity Metrics

                                    Correlations

                   Number of         Halstead's         McCabe's
Measures           statements     Volume     Effort       v(G)

All Programs:
  Study Time          .30          .29        .21          .04
  Total Time          .35*         .78***     .61***       .66***

Program 1:
  Study Time          .13          .02       -.01         -.37
  Total Time          .15          .42        .28          .38

Programs 2 & 3:
  Study Time          .39          .04        .08          .07
  Total Time          .20          .85***     .83***       .59**

Note: For all programs, n = 27; for Program 1, n = 9;
for Programs 2 & 3, n = 18.
*p ≤ .05
**p ≤ .01
***p ≤ .001

[Figure 2 not reproduced. Recoverable axis labels: TOTAL TIME (SECONDS); TIME (SECONDS).]



Total  times  during  the  three  experimental  tasks  did  not  vary 
as  a  function  of  presentation  order.  Thus,  it  appears  that 
the  pretest  provided  sufficient  experience  to  eliminate 
learning  effects  during  the  experimental  tasks.  Pretest  time 
was  a  moderate  predictor  of  time  for  the  experimental  tasks 
(r = .40, p ≤ .05), suggesting the influence of individual
differences  among  participants. 

Analyses  of  the  programs  submitted  by  the  participants 
yielded  counts  of  the  total  number  of  actions  to  add,  delete, 
change or list a program. (The renumber command was not used
by the participants.) The number of calls to the editor for
the "ADD", "CHANGE", "DELETE", and "LIST" functions was also
tallied. The number of actions was greater than or equal to
the  number  of  calls  in  each  session,  since  more  than  one 
statement  could  be  added,  deleted,  changed,  or  listed  during 
a  call  to  a  given  editor  command.  For  example,  if  a 
programmer  added  five  statements  at  one  time,  it  would 
increase  the  number  of  actions  by  five  but  would  increase 
calls  to  the  editor  for  "ADD"  by  only  one. 
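
The distinction between actions and calls can be made concrete with a small tally over an audit trail. The record format below, a list of (command, number of statements affected) pairs, is a hypothetical simplification and not the actual format used by the data collection system.

```python
from collections import Counter

# Hypothetical audit trail: (editor command, number of statements affected).
trail = [
    ("ADD", 5), ("LIST", 5), ("CHANGE", 1),
    ("ADD", 2), ("DELETE", 1), ("LIST", 6),
]


def tally(records):
    """Count editor calls and statement-level actions per command."""
    calls, actions = Counter(), Counter()
    for command, n_statements in records:
        calls[command] += 1               # one call per editor invocation
        actions[command] += n_statements  # one action per statement touched
    return calls, actions


calls, actions = tally(trail)

# Program length as defined earlier: statements added minus statements deleted.
program_length = actions["ADD"] - actions["DELETE"]
print(dict(calls), dict(actions), program_length)
```

In this example the first "ADD" call contributes five actions but only one call, which is why actions can never be fewer than calls.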

Descriptive  statistics  for  editor  actions  are  presented 
by  run  in  Table  4.  Only  about  half  of  the  participants 
required  more  than  one  run  to  successfully  complete  their 
program.  Very  few  participants  required  more  than  two  runs. 
After  the  first  run,  the  number  of  lines  added  decreased 
substantially,  while  the  number  of  lines  changed  or  deleted 
remained  fairly  constant. 

Both  the  number  of  actions  taken  and  the  number  of  calls 
were  analyzed  for  each  compilation.  Actions  and  calls 
performed  during  the  first  run  (original  entry  of  the 
program)  were  significant  predictors  of  the  total  time  to 
complete  the  experimental  task  (Table  5).  That  is,  those 
programmers  who  manipulated  the  program  listing  more 
frequently  (i.e.,  added,  deleted,  changed  or  listed)  prior  to 
the  first  submission  of  the  program  had  longer  total 
completion  times.  Actions  taken  during  the  second,  third, 
and  fourth  submissions  were  few  in  number  and  not  well 
correlated  with  performance. 

Table  6  shows  the  correlations  between  the  various 
actions  and  the  Halstead  metrics  for  each  program.  No 
differences  were  observed  in  the  ability  of  these  two  metrics 
to  predict  the  input  and  change  or  total  editor  actions  taken 
by  the  participants. 

No  predictions  about  performance  could  be  made  from 
answers  to  the  questionnaire.  This  inability  probably 
resulted from the small sample size in this experiment (n = 9).


Table 4

Descriptive Statistics for Editor Actions by Run

                              Standard       No. of
Action             Mean      Deviation    Participants

ADDS
  Run 1            21.2         7.3            27
  Run 2            13.5        16.2             4
  Run 3             2.2         1.5             4
  Run 4             0           0               0

CHANGES
  Run 1             2.8         2.8            21
  Run 2             2.2         1.4            13
  Run 3             2.2         1.0             4
  Run 4             1.0         0.0             2

DELETES
  Run 1             3.0         2.2            10
  Run 2             3.0         0.0             2
  Run 3             2.0         0.0             1
  Run 4             1.0         0.0             1

LISTS
  Run 1            42.2        47.9            24
  Run 2            27.5        17.9            13
  Run 3            33.7        35.9             3
  Run 4            12.5        13.4             2


Table 5

Correlations of Actions with Total Time
to Complete Program Successfully

Actions                     For Run 1    For All Runs

Number of actions:
  ADDS                       .71***        .46**
  CHANGES                    .71***        .73***
  DELETES                    .70***        .68***
  LISTS                      .74***        .74***
  Total                      .77***        .75***

Number of editor calls:
  ADDS                       .71***        .77***
  CHANGES                    .68***        .67***
  DELETES                    .64***        .57***
  LISTS                      .78***        .73***
  Total                      .80***        .79***

Note: n = 27
**p < .01
***p < .001
Table 6

Correlations among Actions and Complexity Metrics

                                         Halstead's
Actions                              Volume      Effort

ADD                                  .63***      .68***
CHANGE                               .49**       .39*
DELETE                               .45**       .38*
LIST                                 .60***      .51**

Input and Change Actions
(ADD, CHANGE, or DELETE):            .68***      .68***

Total Actions
(ADD, CHANGE, DELETE, or LIST):      .64***      .57***

Note: n = 27
*p ≤ .05
**p ≤ .01
***p ≤ .001
DISCUSSION


In  this  experiment,  Halstead's  volume  and  effort  metrics 
and  McCabe's  cyclomatic  number  were  better  predictors  of 
performance  time  than  was  the  number  of  statements  in  the 
program.  A  previous  experiment  in  this  research  program 
demonstrated  moderate  to  good  relationships  between  debugging 
performance  and  the  Halstead  and  McCabe  metrics  (Curtis, 
Sheppard,  &  Milliman,  1979).  However,  in  that  experiment  the 
metrics  were  computed  on  the  prototype  programs  submitted  to 
the  participants.  Since  the  experimental  task  in  this 
experiment  was  to  construct  a  program,  it  was  possible  to 
compute  the  metrics  on  the  individual's  own  programs.  The 
higher  correlation  of  performance  time  with  the  Halstead  and 
McCabe  metrics  in  this  experiment  indicates  that  these 
metrics  are  better  predictors  of  the  psychological  complexity 
of  a  program  than  is  the  number  of  statements  in  the  program. 
Differences  in  the  prediction  of  performance  between  the 
McCabe  and  Halstead  metrics  have  not  been  consistent  over  the 
experiments  in  this  research  program.  However,  as  we 
improved  our  experimental  techniques  and  reduced  the 
individual  difference  variation,  the  strongest  results  were 
obtained  more  consistently  for  the  Halstead  metrics. 

The  Halstead  metric  for  the  time  required  to  generate  a 
program  underestimated  the  times  required  in  this  experiment. 
This  result  is  not  surprising  since  it  is  doubtful  that 
dividing  Halstead's  effort  metric  by  the  Stroud  number  (18 
mental  discriminations  per  second)  is  related  to  the  actual 
processes  involved  in  constructing  a  program.  The  Stroud 
number  is  related  to  perceptual  discrimination  of  simple 
stimuli  (e.g.,  critical  fusion  frequency  in  integrating 
separate  slides  into  a  motion  picture).  Constructing 
programs  requires  more  complex  cognitive  processing,  and  the 
Stroud  number  is  probably  inappropriate  as  a  measure  of  this 
phenomenon.  Further,  any  measure  of  mental  processing  time 
will  differ  both  among  people  and  in  the  same  person  over 
time,  and  would  have  to  be  recalibrated  for  each  prediction. 
Thus, Halstead's time metric may correlate with programming
time, but it is not an absolute measure of the phenomenon.
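The time estimate discussed above can be sketched in a few lines of Python. This is a minimal illustration, not the computation used in the experiment: it assumes the commonly cited Halstead definitions (volume V = N log2 n, difficulty D = (n1/2)(N2/n2), effort E = D x V), and the function name and the counts in the usage note are hypothetical.

```python
import math

STROUD_NUMBER = 18  # mental discriminations per second, as cited in the text


def halstead_time_seconds(n1, n2, N1, N2, stroud=STROUD_NUMBER):
    """Return (volume, effort, time in seconds) from Halstead's basic counts.

    n1, n2: number of distinct operators and distinct operands
    N1, N2: total occurrences of operators and of operands
    These formulas are the textbook definitions, assumed here.
    """
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)
    effort = difficulty * volume
    # Time metric: effort divided by the Stroud number.
    return volume, effort, effort / stroud
```

For example, `halstead_time_seconds(8, 8, 16, 16)` gives a volume of 128 and an effort of 1024, so the predicted time is 1024/18, or about 57 seconds. Because the Stroud number is a fixed constant, the estimate can only rescale effort; it cannot adapt to differences among programmers, which is the limitation noted above.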

At  the  level  of  the  individual  programmer,  familiarity 
with  relevant  programming  concepts  is  important  for  program 
construction.  Program  1  required  the  matching  of  alphabetic 
strings  and  took  over  twice  as  much  time  as  the  other  two 
programs.  Program  2  required  summing  some  negative  and 
positive  values,  and  Program  3  required  finding  the  maximum 
and  minimum  of  a  set  of  numbers.  The  latter  two  are 
elementary  operations  that  are  performed  often  by 
programmers.  These  algorithms  were  probably  more  familiar  to 
the  nine  participants  than  was  the  matching  of  strings. 
Similar results for differences among algorithms were found
in  other  experiments  in  this  research  program  (Sheppard  et 
al.,  1979). 

A  frequent  error  for  Program  1  involved  comparing  the 
masters  to  the  list  items  instead  of  the  reverse  procedure. 
Some  programmers  did  not  initially  perceive  that  the  two 
methods  of  comparison  produced  different  results.  Thus  they 
had  to  recode  the  algorithm  to  do  the  job  correctly.  The 
specifications  for  this  part  of  Program  1,  "CHECK  FOR 
MASTERS",  were  not  as  complete  and  detailed  as  the 
specifications  presented  for  Programs  2  and  3.  It  is 
therefore  not  surprising  that  a  great  deal  more  variation 
occurred in the performance times for Program 1. Informal
analyses  indicated  that  when  an  algorithm  was  insufficiently 
specified  participants  produced  a  variety  of  solutions,  thus 
introducing  greater  variability  into  the  construction 
process.  In  order  to  reduce  variability  in  programmer 
performance,  clear,  detailed  specifications  for  coding  appear 
to  be  essential. 

No  differences  were  found  among  the  methods  of 
documentation  presented  along  with  the  natural  language 
descriptions  of  the  problems.  Informal  conversations  with 
the  participants  indicated  that  the  additional  documentation 
was  often  ignored.  The  problems  were  simple  enough  that  most 
participants  worked  from  the  English  description  of  the 
task.  It  is  expected  that  in  problems  larger  than  those 
presented  here  the  additional  documentation  would  be  useful. 
Predicting  the  relative  merits  of  the  documentation  formats 
is  not  possible  from  this  experiment. 

A  considerable  learning  effect  took  place  during  the 
pretest.  The  experimenter  participated  actively  during  this 
task,  answering  questions  on  the  use  of  the  microcomputer, 
the  editor,  and  the  Fortran  compiler.  Although  the  pretest 
problem  was  elementary  (compute  the  average  of  a  file  of 
numbers),  the  time  and  number  of  tries  required  to  execute 
this program were greater than for the experimental tasks.
Learning  to  use  the  microcomputer  and  the  editor  during  this 
pretest  seemed  to  be  adequate  to  eliminate  the  effects  of 
learning  during  the  experimental  tasks  themselves. 

Averaged  across  all  conditions,  the  first  submission  of 
the  programs  consumed  86%  of  the  total  time  to  completion 
(18.4  of  the  21.3  minutes).  Thus  the  actions  (e.g.,  ADDS) 
during  the  first  submission  were  excellent  predictors  of  the 
total  time  to  complete  the  program. 

It  is  not  surprising  that  those  programmers  who  entered 
the  largest  number  of  statements  had  the  longest  completion 
times for the tasks. It is interesting to note, however,
that  they  also  had  the  largest  number  of  changes,  deletes, 
and  lists.  Apparently  those  programmers  who  select  more 
lengthy  methods  of  solving  problems  increase  their  cognitive 
load  and  need  greater  numbers  of  adjustments  (changes  and 
deletes)  and  greater  help  in  verifying  the  current  state  of 
the  program  listing  (lists). 

The  number  of  calls  to  the  editor  for  a  command  type 
appeared  to  predict  total  time  as  well  as  the  number  of 
statements.  Clearly  this  result  might  be  quite  different 
with  larger,  more  complex  programs.  If  these  data  were  to  be 
validated  for  larger  programs  in  other  experiments,  however, 
it  would  suggest  that  measuring  the  number  of  editor  calls  on 
the  first  submission  or  first  few  submissions  might  provide  a 
project  manager  some  information  concerning  overall 
performance. 

The  practical  implications  of  the  results  presented  here 
are  substantial.  All  major  software  cost  estimation  models 
are  driven  from  an  estimate  of  the  number  of  lines  of  code  in 
the  delivered  product  (Data  Analysis  Center  for  Software, 
1979).  The  data  presented  here,  however,  argue  that  at  the 
modular  level,  lines  of  code  is  a  poor  measure  of  the  time 
required  to  construct  a  program.  Putnam  (1980)  would  suggest 
that  large  variances  in  these  relationships  would  probably 
decrease  substantially  when  data  are  aggregated  at  the  system 
level.  Nevertheless,  in  the  data  presented  here  the  Halstead 
and  McCabe  metrics  were  significantly  better  predictors  of 
performance  time  than  was  the  number  of  statements.  Once 
their  values  can  be  estimated,  these  complexity  metrics  may 
provide  managers  with  better  estimates  of  the  resources  and 
costs  involved  in  later  stages  of  the  development  project. 
Data  from  this  experiment  also  suggest  that  the  validity  of 
these  estimates  may  depend  on  whether  sound  principles  of 
software  engineering  are  observed  in  developing  a  system. 
Projects  on  which  modern  programming  practices  are  enforced 
appear  to  experience  more  predictable  results  than  projects 
without  such  discipline  (Milliman  &  Curtis,  1979).  This 
improved  prediction  results  from  the  partial  elimination  of 
individual  differences  in  performance  achieved  through 
enforcement  of  programming  standards.  Some  refinement 
remains  in  developing  software  complexity  metrics  into  a 
management  information  tool  (Curtis,  1980),  but  their 
potential  is  evident. 
Instructions to Participants


Hello, 


Today,  we  are  asking  you  to  participate  in  an 
experiment  we  hope  will  be  both  entertaining  and  challenging. 
This  study  is  being  sponsored  by  GE  and  the  Office  of  Naval 
Research  to  examine  the  process  of  writing  computer  programs. 
To  accomplish  this,  we  will  give  you  several  different 
problems  and  ask  you  to  implement  them.  Our  purpose  is  not 
to  evaluate  computer  programmers.  Your  performance  on  a 
program  will  be  compared  only  to  your  performance  on  other 
programs,  and  no  form  of  competition  is  involved.  We  hope 
you  will  assist  us  in  what  we  believe  is  important  research 
in  software  engineering.  However,  your  involvement  is 
voluntary  and  you  are  free  to  withdraw  from  participation  at 
any  time.  All  programs  and  papers  that  you  will  be  handed 
are  carefully  numbered  so  it  is  not  necessary  for  you  to  put 
your  name  on  any  of  these.  These  numbers  are  solely  for  the 
purpose  of  identifying  different  problems  and  cannot  be  used 
to  identify  you  as  an  individual.  Your  work  will  remain 
completely  anonymous  and  data  collected  in  this  study  will  be 
used  for  research  purposes  only. 

For  each  task,  you  will  be  given  a  problem  description 
and  the  format  and  contents  of  the  input  files.  Your  job  is 
to  write  the  program  to  perform  the  requested  function. 

Please make an attempt to do all your work on the computer,
but  if  you  wish  to  write  notes  down,  please  do  so  on  the 
paper  provided  for  each  problem. 

Directions  for  using  the  computer  system  are  found  on 
the following pages. You may ask any questions of the
experimenter during the session; however, he may require you
to resolve some questions yourself. The first problem will
be  a  short  introductory  problem,  familiarizing  you  with  the 
computer  and  the  types  of  problems  you  will  receive.  Upon 
completion  of  this  problem,  the  computer  will  then  lead  you 
through  the  3  problems  of  the  experiment.  Each  problem  will 
have  an  associated  group  of  papers  with  instructions  and 
documentation.  Please  turn  to  the  appropriate  set  when  told 
to  do  so  by  the  computer.  When  you  have  completed  all  three 
experimental  programs  the  computer  will  present  a 
questionnaire  about  your  past  programming  experience  and  then 
you  are  free  to  leave,  but  please  do  not  discuss  any  of  the 
programs  you  worked  on  with  anyone  else  until  after  we  have 
completed  all  experimental  sessions.  We  request  this  of  you 
only to ensure that our results are valid.

If  there  are  any  questions,  please  ask  them  at  this 
time.  If  you  are  willing  to  participate  please  sign  your 
name  on  the  line  below  indicating  that  you  have  read  and 
understood  these  instructions. 


Signature 
APPENDIX B


NOTES  ON  USING 
THE  SYSTEM  AND  THE 
FORTRAN  COMPILER 


COMPUTER  SYSTEM  INSTRUCTIONS 


When  you  first  sit  at  the  computer  terminal,  it  will  be 
in  the  command  mode.  The  following  prompt  line  will  appear 
at the top of the screen:

ENTER  COMMAND  (E-EDIT,  R-RUN,  Q-QUIT) 

A  typical  scenario  might  be  to  enter  the  editor  (E),  write  a 
program,  exit  the  editor  and  then  run  the  program  (R). 

E: If you wish to enter some lines or make corrections to
your program, type E and press the return key. This will
put you in the editor portion of the program. A detailed
description of the editor can be found on the next page.

R: If you wish to submit your program to compile and run,
type R and press the return key. Call the supervisor,
who will compile and run your program, during which time
you may take a short break. If the compile is not
successful, the supervisor will give you a printed
listing of your compile errors. If the compile is
successful, you will be given the run output. The
supervisor will tell you whether the output is correct.

Q: The Quit command is to be used only when "all else
fails", and it must be entered by the supervisor. If you
feel you cannot continue with a specific program or the
experiment, tell the supervisor and he will assist you in
terminating the session.
EDITOR


Upon  entry  to  the  EDITOR  program  you  will  be  given  the 
following  prompt: 

EDIT  COMMAND:  (R-RENUMBER,  L-LIST,  D-DELETE,  A-ADD, 
C-CHANGE,  F-FINISH) 

The  editor  is  line-number  oriented. 

These  numbers  are  assigned  by  you  when  you  enter  the  A 
(ADD) command. The L (LIST) and D (DELETE) commands
reference  existing  lines  and  the  R  (RENUMBER)  command 
establishes  a  new  numbering  scheme  for  the  program.  Line 
numbers  are  allowed  in  the  following  format: 

0  to  3  digits  optionally  followed  by  a  period  and 
0  to  3  digits.  At  least  one  character  must  appear. 
Each  number  is  a  fraction  which  can  range  from 
000.001  to  999.999  for  ADD,  RENUMBER,  and  CHANGE, 
and  000.000  to  999.999  for  LISTS  and  DELETES. 

Examples of invalid line numbers:

  (blank)    need at least 1 digit
  X1         X is not a digit or a "."
  ..01       only one "." is allowed
  .0001      more than 3 digits to right of decimal
  1000       more than 3 digits to left of decimal
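The format rule and the invalid examples above can be checked mechanically. The following Python sketch is an illustration only, not the editor's actual implementation (which the report says was written in PASCAL); the function name `parse_line_number` and the `allow_zero` flag are hypothetical.

```python
import re
from decimal import Decimal

# 0 to 3 digits, optionally followed by a period and 0 to 3 digits.
_LINE_NUMBER = re.compile(r"\d{0,3}(\.\d{0,3})?")


def parse_line_number(text, allow_zero=False):
    """Return the line number as a Decimal, or None if it is invalid.

    allow_zero=True models LIST and DELETE, which accept 000.000;
    ADD, RENUMBER, and CHANGE require at least 000.001.
    """
    # The pattern alone would accept "" and ".", so also require a digit.
    if not _LINE_NUMBER.fullmatch(text) or not any(c.isdigit() for c in text):
        return None
    value = Decimal(text)
    if value == 0 and not allow_zero:
        return None
    return value
```

Under these assumptions, `parse_line_number("99.03")` yields `Decimal("99.03")`, while each of the invalid examples listed above yields `None`. Using `Decimal` rather than binary floating point keeps fractions such as .001 exact.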

NOTE:  If  you  wish,  you  may  enter  the  line  information 

immediately  after  the  command  letter  instead  of 
pressing  return  and  being  prompted. 

R: To renumber the file, type R followed by a carriage
return. You will be prompted to specify the First line
number and, optionally, an Increment:

RENUMBER COMMAND: FIRST LINE[/INCREMENT]

The default Increment is 1. Neither the First line
number nor the Increment may be 0. Neither number may
be larger than 999.999.

Example:

RENUMBER COMMAND: FIRST LINE[/INCREMENT] 20/10
program before renumber:

  1        line 1
  1.001    line 2
  1.002    line 3
  103      line 4

program after renumber:

  20       line 1
  30       line 2
  40       line 3
  50       line 4

Examples of valid Renumber commands:

  3/4        Numbers program at 3, 7, 11, ... until end
  900        Numbers program at 900, 901, 902, ... until end
  .001/1     Numbers program at .001, 1.001, ... until end

Examples of invalid Renumber commands:

  /3         must have a starting line
  0          cannot have line number or Increment = 0
  /1/2       only 1 slash is allowed to separate number and
             Increment
  900/100    illegal if there is more than 1 line because the
             second line will be greater than 999.999
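The Renumber rules above amount to assigning First, First + Increment, First + 2 x Increment, and so on, in program order. A minimal Python sketch under that reading follows; the function name, the list-of-pairs representation, and the error messages are assumptions made for illustration.

```python
from decimal import Decimal

MAX_LINE = Decimal("999.999")


def renumber(lines, first, increment="1"):
    """Return `lines` renumbered from `first` in steps of `increment`.

    `lines` is a list of (number, text) pairs in program order.
    Raises ValueError for a zero start or increment, or when a new
    number would exceed 999.999.
    """
    first, increment = Decimal(first), Decimal(increment)
    if first <= 0 or increment <= 0:
        raise ValueError("first line and increment must be nonzero")
    result = []
    for index, (_, text) in enumerate(lines):
        number = first + index * increment
        if number > MAX_LINE:
            raise ValueError("line number would exceed 999.999")
        result.append((number, text))
    return result
```

Applied to the four-line program shown earlier with `first="20"` and `increment="10"`, this produces lines 20, 30, 40, and 50. The command 900/100 raises an error on any program of two or more lines, matching the last invalid example.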

L: To list lines in your program, type L. You will be
prompted to state the first line you want to look at,
followed optionally by a slash and the last line you want
to look at.

LIST COMMAND: FIRST LINE[/LAST LINE]

If you type only the first line number, only that line is
printed. If you enter both line numbers, the lines that
fall within that range will be printed.

It is not an error if the first or the last line number
entered does not have a corresponding line in the
program. If there are no lines in the specified range a
message will be printed.

Examples of valid List commands:

  0/999.999    This lists all the lines in the program. 0 is
               valid in a List command and can be used when
               you don't know the first line number.
  99.03        This lists only line number 099.030.
  1/1 or 1     This lists only line 001.000.

Examples of invalid commands:

  /3           You must have a first number.
  0 or 00      There can be no line numbered 0, so no line
               will be found and listed. This command is not
               invalid, it is just meaningless.
  1000         Line number is too large.

There must be a first number, and if there is a slash, there
must also be a second number.
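The range rule for List (and the identical rule for Delete) can be expressed as a simple filter. This Python sketch is illustrative only; `select_range` and the (number, text) pair representation are hypothetical names, not part of the editor described here.

```python
from decimal import Decimal


def select_range(lines, first, last=None):
    """Return the (number, text) pairs whose numbers fall in [first, last].

    With no `last`, only the line numbered exactly `first` is selected.
    The bounds need not correspond to existing lines; an empty result
    is where the editor would print its "no lines" message.
    """
    low = Decimal(first)
    high = low if last is None else Decimal(last)
    return [(n, t) for n, t in lines if low <= n <= high]
```

With a program holding lines 1 and 2, `select_range(program, "0", "999.999")` returns both lines, `select_range(program, "1")` returns only line 1, and `select_range(program, "0")` returns an empty list, which is why the command 0 is described as meaningless rather than invalid.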


D: To Delete lines in your program, type D. You will be
prompted to state the first line you wish to delete,
followed optionally by a slash and the last line you want
to delete.

DELETE COMMAND: FIRST LINE[/LAST LINE]

If you enter only the First number, then only that
line is deleted. If you enter both line numbers, all the
lines that fall within that range are deleted. It is not
an error if the First or the Last line number entered does
not have a corresponding line in the program. If there
are no lines in the specified range a message will be
printed.

Examples of valid Delete commands:

  0/999.999    Deletes all lines in program. 0 is valid for
               start of range and serves as a useful tool to
               include the first line number.
  99.03        Deletes only line 099.030 if it exists.
  1/1 or 1     Deletes only line 001.000.

Examples of invalid commands:

  /3           You must have a first number.
  0 or 0/0     There can be no line numbered 0, so no line
               will be found and deleted. This command is
               not invalid, it is just meaningless.
  1000         Number is too large.

If there is a slash there must also be a second number.


A: To Add lines to the program, type A. You will be
prompted to enter the First line number you wish to Add,
followed optionally by a slash and the Increment.

ADD COMMAND: FIRST LINE[/INCREMENT]

If you omit the slash and Increment then the Increment
defaults to 001.000. The First line number must not
already exist. For instance, if your program appears as
follows,

1  line 1

2  line 2

you may Add any line but 1 and 2. This includes the
range from .001 to .999, 1.001 to 1.999, and 2.001 to
999.999.

You will be prompted with the First line number. You
may enter up to 72 characters. After you enter a line
and press Return it will be included in your program at
the place its number indicates. You will then be
prompted for the next line number, which is determined
by adding the Increment to the line number of the line
you just finished entering. If the next line number is
equal to or greater than the number of an already
existing line in the program, the Add will be terminated.
All the lines you have already entered will remain in
the program and will not be affected by the termination.
When you finish entering the lines you want, enter two
slashes (//) as the first characters of a line and press
return. This will return you to the EDIT COMMAND level.

The  following  examples  refer  to  the  example  shown  above: 

  1         Rejected because line 1 already exists.
  1.01      This will allow one line to be added; then, since
            2.01 (1.01 + 1) is greater than or equal to 2, the
            Add will be terminated.
  0         Rejected because 0 cannot be a line number.
  5         You may enter lines from 5, 6, 7, ... 999.
  /.5       Rejected because there is no line number.
  1.4/.1    This will allow you to add up to 6 lines (1.4, 1.5,
            1.6, 1.7, 1.8, 1.9) before it terminates because it
            passes or equals 2.
  5/        Rejected because if there is a slash an increment
            must follow.


Sample session:

ADD COMMAND: FIRST LINE[/INCREMENT] 1.3/.2

1.3  you type something here
1.5  you also type something here
1.7  //

The program would now look like this:

1    line 1
1.3  you type something here
1.5  you also type something here
2    line 2
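The termination rule for Add can be made concrete by computing which line numbers a session would prompt for. The Python sketch below is an interpretation, not the editor's code: it reads "an already existing line" as the nearest existing line above the starting number, which is the reading consistent with the examples (Add at 5 is not stopped by existing lines 1 and 2). The function name is hypothetical.

```python
from decimal import Decimal

MAX_LINE = Decimal("999.999")


def offered_line_numbers(existing, first, increment="1"):
    """Yield, in order, the line numbers an Add session would prompt for.

    `existing` holds the numbers already in the program. The session
    ends when the next number would reach or pass the nearest existing
    line above the starting point, or would exceed 999.999. (The user
    may also stop earlier by typing //, which is not modeled here.)
    """
    number, step = Decimal(first), Decimal(increment)
    if number <= 0 or step <= 0 or number in existing:
        raise ValueError("rejected: zero or already existing line number")
    following = [n for n in existing if n > number]
    limit = min(following) if following else None
    while number <= MAX_LINE and (limit is None or number < limit):
        yield number
        number += step
```

For the two-line example program, `list(offered_line_numbers({Decimal(1), Decimal(2)}, "1.4", ".1"))` gives the six numbers 1.4 through 1.9, and starting at 1.01 with the default increment gives only 1.01. In the sample session, 1.7 is offered as well; the user simply ends the Add there with //.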

C:  To  change  an  existing  line,  type  C.  You  will  be  prompted 
to  enter  the  line  number  of  the  line  you  wish  to  change: 

CHANGE  COMMAND:  LINE  NUMBER 

The  system  will  then  type  the  line  as  it  currently 
is.  Enter  the  line  you  want,  then  press  return.  If  you 
decide  you  really  didn't  want  to  make  a  change,  enter  two 
slashes  (//)  followed  by  a  return.  The  system  will  leave 
the  line  as  it  was. 


Given the sample program:

1  LINE 1

2  LINE 2

CHANGE COMMAND: LINE NUMBER 1 would perform the
following:

1  LINE 1

1

You enter the new line contents (e.g., NEW LINE) and press
return. The program would now look like this:

1  NEW LINE

2  LINE 2

If you had entered // the program would not have been
changed.

The  following  examples  refer  to  the  sample  program  given 
above: 


2 

3 

1/2 


•  Allows  you  to  change  line  002.000. 

•  Rejected  because  there  is  no  line 
003.000. 

•  Rejected  because  only  one  line  is 
allowed  to  be  Changed  at  a  time 
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0 


•  No  line  0  exists  or  is  allowed  to 
exist. 

F: When you wish to return to the main command mode (usually
to submit your program for a run), enter F followed by
return.

Notes  on  typing  on  the  computer: 

The  following  rules  apply  to  your  interaction  with 
the  computer  before  you  press  the  return  key. 

Erase characters - If you wish to erase the last
                   character you typed, press the
                   DELETE key. If you wish to erase
                   more, keep pressing the DELETE key.

Erase a line -     If you wish to start the present
                   line over, press the LINE DEL key
                   and it will erase all characters you
                   have just typed. You may then
                   re-enter your command or program line.

After  you  have  pressed  return  you  will  have  to  live  with 
the  consequence  of  your  actions. 

DO  NOT  USE  TAB  characters.  This  program  is  not  set  up  to 
handle  them. 
FORTRAN NOTES


The  FORTRAN  you  will  be  using  is  STANDARD  FORTRAN  IV 
with  the  few  differences  noted  below: 

1. Opening files for input should be done in the following
manner:

      CALL OPEN (6, 'filename', 0)

The file name must be followed by four blanks and must be
in single quotes. If the input file is ONRD001, the
statement would be (where b denotes a blank):

      CALL OPEN (6, 'ONRD001bbbb', 0)

It  is  not  necessary  to  close  the  file. 

2. All read statements use 6 as the device unit number;
e.g.,

      READ (6, 100) A, B, C
  100 FORMAT (3F10.3)

3. All output statements use device number 2; e.g.,

      WRITE (2, 300) I, A
  300 FORMAT (10X, 'I=', I5, 'A=', F10.3)

4. Only variables are allowed in WRITE lists. No
expressions or constants are allowed.

acceptable:

      WRITE (2, 2) A, B, C
      WRITE (2, 400) (A(I), I = 1, 10)
      WRITE (2, 346)

not acceptable:                      REASON

      WRITE (2, 2) A, B, 3           3 is a constant
      WRITE (2, 400) A+1             A+1 is an expression
      WRITE (2, 195) I-J, B, D       I-J is an expression

5. End of file or ERROR branches may be specified in READ or
WRITE statements using the ERR= and END= options; e.g.,

   10 READ (6, 2, END=100, ERR=200) A, B, C
    2 FORMAT (3F10.2)
      GO TO 10
  100 WRITE (2, 3)
    3 FORMAT (' END OF FILE')
      GO TO 999
  200 WRITE (2, 4)
    4 FORMAT (' ERROR IN READ')
  999 STOP
      END

6. No COMPLEX data types are allowed.

7. DO loops will always be executed at least once.

8. Hollerith constants can be represented by the NHString
format or enclosed in single quotes.

Examples:

  A = 'A' or 1HA          A has been previously defined
                          to be type logical by the
                          programmer

  I = 'CE' or 2HCE        I has been previously defined
                          to be type Integer by the
                          programmer

  R = 'JXYZ' or 4HJXYZ    R has been previously defined
                          to be type real by the
                          programmer

If fewer characters than the allotted space are given,
the Hollerith values are left justified with trailing
blanks.

9. Mixed Mode expressions and assignments are allowed, and
conversions are done automatically.

10. The specification statements must appear in the following
order:

  A. PROGRAM or SUBROUTINE or FUNCTION or BLOCK DATA
  B. Type or EXTERNAL or DIMENSION
  C. COMMON
  D. EQUIVALENCE
  E. DATA
  F. Statement Functions
***WE WOULD LIKE YOU TO ANSWER THE FOLLOWING QUESTIONS FOR***
***OUR RESEARCH PURPOSES:                                 ***
***                                                       ***
***PLEASE ANSWER EACH QUESTION ON A SINGLE LINE           ***
***AFTER TYPING THE ANSWER TO A QUESTION, PRESS RETURN.   ***
***                                                       ***
***IF THE ANSWER TO A QUESTION IS YES OR NO, YOU MAY      ***
***ENTER Y OR N.                                          ***
***IF YOU CAN NOT ANSWER A QUESTION ENTER - DON'T KNOW    ***
***IF THE QUESTION IS NOT APPLICABLE ENTER - N/A          ***
***                                                       ***


HOW  MANY  YEARS  HAVE  YOU  BEEN  PROGRAMMING  PROFESSIONALLY? 

HOW  MANY  YEARS  HAVE  YOU  BEEN  PROGRAMMING  FORTRAN  PROFESSIONALLY? 
HAS  YOUR  EXPERIENCE  BEEN  PRIMARILY  WITH 

A.  ENGINEERING 

B.  STATISTICAL 

C.  NON-NUMERICAL 

D.  BUSINESS 

E.  DATA  BASE 

F. OTHER (PLEASE SPECIFY)

HOW  MANY  LINES  WERE  IN  YOUR  LONGEST  FORTRAN  PROGRAM? 

HOW  MANY  LINES  WERE  IN  YOUR  LONGEST  NON-FORTRAN  PROGRAM? 

IN  WHAT  LANGUAGE  WAS  YOUR  LONGEST  NON-FORTRAN  PROGRAM? 

HAVE  YOU  USED: 

FORTRAN? 

FORTRAN  77? 

COBOL? 

PL/1? 

BASIC? 

PASCAL? 

APL? 

ALGOL? 

JOVIAL? 

ASSEMBLER? 

RPG? 

SNOBOL? 

LISP? 

OTHER (GIVE  NAME)? 

WHAT  WAS  THE  FIRST  LANGUAGE  YOU  LEARNED? 

IN  YOUR  PROGRAMMING  HAVE  YOU  USED: 

DO  statement? 
arrays? 

CALL  with  parameters? 

COMMON? 

READ  statement? 

PRINT  statement? 

WRITE  statement? 

FORMAT  statement? 

'X'  format  specification? 

'A'  format  specification? 

'I'  format  specification? 

'F'  format  specification? 
continuation  lines? 

'H'  format  specification? 
implicit  data  types? 

IF  THEN  ELSE  (concept)? 

CREDITS  in  monetary  transactions? 

DEBITS  in  monetary  transactions? 

Financial  transactions? 

TRIAL  BALANCE  computation? 
GENERAL LEDGER accounting?

REAL notation (0.01)?

tax  computation? 

carriage  control  Hollerith? 

2  dimension  or  more  arrays? 

using  quotes  to  delimit  strings  in  output  formats? 

IMPLICIT  statement? 
heap  sorts? 
stacks? 
tree  search? 

NAMELIST  statement? 

'T' format specification?
interrupt  handlers? 
parsers? 

lexical  analyzers? 

graphics  drivers  and  handlers? 

DATA  statement? 

conversion  from  alpha  to  string  variables? 

IF  of  more  than  1  condition? 
decimal  to  integer  conversion? 
percentile  computation? 

DO  WHILE  concept? 

DO  UNTIL  concept? 
weighting  numbers  or  scores? 

rounding numbers when you don't have a rounding function?

used  an  array  reference  as  an  index  to  another  array? 

finding  maximum  value  in  an  array? 

finding  mean  of  values  in  an  array? 

printing  titles  on  output? 

computing  frequencies  of  items? 

running  sums? 

bubble  sort? 

implied  DO? 

equivalenced  arrays? 

string  variables? 

the  binary  equivalent  of  characters? 
interactive  debugger? 
symbolic  debugger? 

TRACE  mechanism? 

Octal  or  Hex  dumps? 
double  precision? 
free  field  I/O? 
matrix  inversion? 
pattern  matching? 
device  drivers? 
batch  systems? 
interactive  systems? 
list  handling  languages? 

*** 

On  the  back  of  your  instruction  sheet  we  would  like  you  to  indicate 
any  other  particulars  which  you  feel  may  have  an  effect  on  your 
performance.  For  instance,  we  would  like  to  know  if  most  of  your 
work  is  in  debugging  systems  or  in  design.  Also,  please  indicate 
your  reactions  to  the  experiment  and  anything  that  you  feel  might 
help  us  to  improve  it.  If  you  pursued  the  task  in  any  special  way 
we  would  like  to  know. 

***
SOFTWARE EXPERIMENTATION LABORATORY


Experiments  in  software  development  depend  very  heavily 
upon  the  methods  and  tools  used.  Research  has  typically 
involved  either  field  studies  on  large  programs  or 
experiments on small artificial problems. The experimental
process  requires  strict  controls  over  system  response  for 
accurate  timing  and  consistent  presentation  of  materials. 

All  interactions  of  the  subject  with  the  computer  must  be 
recorded  so  that  his  actions  can  be  reconstructed  at  a 
later  time.  For  ease  of  debugging  and  modification  the 
experimental  system  should  be  implemented  using  a  high  level 
language. 

The  Software  Management  Research  group  has  established  a 
software  experimentation  laboratory  around  a  Northwest 
Microcomputer Systems NMS 85/P. This microcomputer runs both
the UCSD PASCAL operating system and the CP/M operating
system, which supports a Fortran compiler. The system has a
CRT for display, a printer for hard copy generation, and an
interval timer that provides timing accurate to 0.01 second.
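
The recording requirement and the 0.01-second timer can be
illustrated with a short sketch. It is written in Python purely
for illustration (the laboratory software itself is in PASCAL),
and the names InteractionLog, record, and replay are
hypothetical, not taken from the actual system. The sketch
stamps each subject action with the elapsed time in hundredths
of a second so that the session can be reconstructed later.

```python
import time
from typing import Callable, List, Tuple


class InteractionLog:
    """Append-only log of subject actions (hypothetical sketch).

    Times are stored as integer ticks of 0.01 s, the timer
    accuracy stated in the text.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._start = clock()  # session start, in seconds
        self.events: List[Tuple[int, str, str]] = []

    def record(self, kind: str, detail: str) -> int:
        """Store one action and return its time stamp in ticks."""
        ticks = round((self._clock() - self._start) * 100)
        self.events.append((ticks, kind, detail))
        return ticks

    def replay(self) -> List[str]:
        """Return the session as readable lines, in recorded order."""
        return [
            f"{ticks / 100:8.2f}s  {kind:<8} {detail}"
            for ticks, kind, detail in self.events
        ]
```

Passing the clock in as a parameter is a design choice of the
sketch: it lets a scripted clock stand in for the hardware
timer when the logger itself is being checked.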


The  PASCAL  language  was  chosen  for  implementing  the 
experimentation  software  for  several  reasons.  It  is  a  simple 
language  with  a  unified  concept,  easy  to  learn,  yet  powerful. 
PASCAL supports data structures such as strings, records,
and pointers, which let the implementer model a problem in a
manner close to its real-world representation. These factors
are expected to contribute to
the  ease  of  implementing,  maintaining,  and  modifying  the 
experimental  software. 

The  experimental  system  established  for  the 
microprocessor  laboratory  is  designed  to  take  the  user  step 
by  step  through  the  experiment.  It  provides  the  environment 
for  a  pretest  and  the  three  experimental  conditions  and 
gives  the  user  an  online  questionnaire. 

The editor is designed to be easy to learn and use. Users
can see where they are and know which lines they are working
with at any time. The edit lines are kept in a random access
file, organized as a linked list, and the user references
them by line number. The disadvantages of this approach are
slow speed and heavy I/O usage, but in a single-user
environment this is not felt to be a problem. One advantage
is that every change is written to the edit file immediately.
(Since microprocessors tend to be less reliable than
minicomputers and mainframes, it was necessary to be ready
for the unexpected.) Another advantage is that the subject
does not have to be concerned with buffering problems. As far
as he knows, his file is essentially infinite; he never has
to ask for a previous or following buffer.
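
A minimal sketch of this storage scheme follows, in Python for
illustration only. The record layout (line number, index of the
next record, 72 characters of text), the class name EditFile,
and its methods are all assumptions; the report does not
describe the actual PASCAL data structures. Each line occupies
one fixed-length record in a random access file, the records
are chained in line-number order, and every change is written
through to the file at once.

```python
import io
import struct

# Assumed record layout: line number, index of next record, text.
RECORD = struct.Struct("<ii72s")
NIL = -1  # marks the end of the chain


class EditFile:
    """Linked list of edit lines in a random access file (sketch)."""

    def __init__(self, stream) -> None:
        self.f = stream
        self.head = NIL  # kept in memory here; a real system would store it too
        self.count = 0   # number of records allocated so far

    def _write(self, idx: int, number: int, nxt: int, text: str) -> None:
        self.f.seek(idx * RECORD.size)
        self.f.write(RECORD.pack(number, nxt, text.encode("ascii")))
        self.f.flush()  # write-through: nothing waits in a buffer

    def _read(self, idx: int):
        self.f.seek(idx * RECORD.size)
        number, nxt, raw = RECORD.unpack(self.f.read(RECORD.size))
        return number, nxt, raw.rstrip(b"\0").decode("ascii")

    def add(self, number: int, text: str) -> None:
        """Insert a line, keeping the chain in line-number order."""
        prev, cur = NIL, self.head
        while cur != NIL:
            n, nxt, _ = self._read(cur)
            if n == number:
                raise ValueError(f"line {number} already exists")
            if n > number:
                break
            prev, cur = cur, nxt
        new = self.count
        self.count += 1
        self._write(new, number, cur, text)
        if prev == NIL:
            self.head = new
        else:
            pn, _, pt = self._read(prev)
            self._write(prev, pn, new, pt)  # relink the predecessor

    def lines(self):
        """Yield (line number, text) pairs by following the chain."""
        cur = self.head
        while cur != NIL:
            number, cur, text = self._read(cur)
            yield number, text
```

The sketch also shows the cost noted above: every insertion
walks the chain with one file read per record, which is slow
but leaves the file current after each change.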

The editor provides five commands that operate on a file:
ADD, DELETE, LIST, RENUMBER, and CHANGE. There is only one
file for each problem, so the computer does the file
handling. Each line in the edit file is assigned a number
by the user, and that number is referenced whenever an edit
operation is performed. The ADD command allows the user to
add one or more lines at any line number not already in use.
The DELETE and LIST commands allow a range of lines to be
deleted or listed. The RENUMBER command allows the user to
reassign line numbers throughout the file based upon a
starting line number and an increment. The CHANGE command
allows the user to reenter a given line. There are no search
or substitute commands. It was felt that these would add to
the complexity of the editor and would not provide great
utility given the small size of the programs to be
constructed.
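
The five commands can be sketched as operations on a table of
numbered lines. The Python below is illustrative only. The
class and method names are hypothetical, and the way ADD
numbers the second and later lines (a fixed step from the
starting number) is an assumption, since the report does not
say how they are assigned.

```python
from typing import Dict, List, Tuple


class LineEditor:
    """In-memory model of the five editor commands (sketch)."""

    def __init__(self) -> None:
        self.lines: Dict[int, str] = {}  # line number -> text

    def add(self, start: int, texts: List[str], step: int = 1) -> None:
        """ADD: place new lines at numbers that are not yet in use."""
        numbers = [start + i * step for i in range(len(texts))]
        clashes = [n for n in numbers if n in self.lines]
        if clashes:
            raise ValueError(f"line numbers already exist: {clashes}")
        self.lines.update(zip(numbers, texts))

    def delete(self, first: int, last: int) -> None:
        """DELETE: remove every line in the range first..last."""
        for n in [n for n in self.lines if first <= n <= last]:
            del self.lines[n]

    def list_lines(self, first: int, last: int) -> List[Tuple[int, str]]:
        """LIST: return the lines in the range, in numeric order."""
        return [(n, self.lines[n]) for n in sorted(self.lines)
                if first <= n <= last]

    def renumber(self, start: int, increment: int) -> None:
        """RENUMBER: reassign numbers from start, spaced by increment."""
        ordered = [self.lines[n] for n in sorted(self.lines)]
        self.lines = {start + i * increment: text
                      for i, text in enumerate(ordered)}

    def change(self, number: int, text: str) -> None:
        """CHANGE: reenter the text of an existing line."""
        if number not in self.lines:
            raise KeyError(f"no such line: {number}")
        self.lines[number] = text
```

For example, adding two lines at 10 with a step of 10, adding
one line at 15, and then renumbering from 100 by 10 yields
lines 100, 110, and 120 in their original order.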

When users are ready to submit a job for compilation and
execution, they exit from the editor and request that their
program be run. This is a one-letter command, and the
microcomputer handles the next steps. At present, Fortran
compilation is performed under a different operating system
on the same computer, so the system is turned over to the
experimenter, who transfers the edit file to the CP/M system
and compiles it. If the compilation is successful, the
experimenter then runs the program. In either case, the
experimenter transfers the results back to the PASCAL system
and resumes the experimental program. If the experimenter
indicates that the run was successful, the experimental
control program proceeds to the next condition. If there was
a compile error or run error, the program stays in the same
edit file and allows the user to resolve the problem.
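
The control flow described above amounts to a loop over the
conditions with an inner edit-and-submit cycle. The Python
sketch below is an illustration under assumptions: the
condition names, the function run_session, and its two
callback parameters are hypothetical. The submit callback
stands in for the whole manual step (transfer, compile, run,
and the experimenter's verdict).

```python
from typing import Callable, List, Tuple

# Assumed labels for the pretest and the three experimental problems.
CONDITIONS = ["pretest", "problem 1", "problem 2", "problem 3"]


def run_session(
    edit: Callable[[str], None],
    submit: Callable[[str], bool],
) -> List[Tuple[str, int]]:
    """Drive one subject through every condition.

    edit(condition) lets the subject work on that condition's file.
    submit(condition) returns True when the experimenter reports a
    successful run and False after a compile or run error.
    Returns the number of submissions needed for each condition.
    """
    summary = []
    for condition in CONDITIONS:
        submissions = 0
        while True:
            edit(condition)      # same edit file until the run succeeds
            submissions += 1
            if submit(condition):
                break            # success: move on to the next condition
        summary.append((condition, submissions))
    return summary
```

Counting submissions per condition is an addition of the
sketch, included only to show the kind of measure such a loop
makes easy to collect.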

After completing the pretest and the three experimental
problems, the users fill out an on-line questionnaire about
their experience in software development (mostly
programming). The results are stored in a file for future
analysis.

This system opens the door to a wide variety of research
efforts, allowing a greater degree of objectivity and
accuracy than is normally available for research in this
field. The present experimental program will need only minor
changes to support experiments on debugging and
modification. The program will be modified to display an
existing file rather than one that is created by the user.

It would also be simple to add a FIND command to allow the
user to search for relevant strings in the file.
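
As an illustration of how small such an addition could be, the
Python sketch below searches a table of numbered lines for a
string. The function name find and the dictionary
representation of the edit file are assumptions, not part of
the existing editor.

```python
from typing import Dict, List, Tuple


def find(lines: Dict[int, str], text: str) -> List[Tuple[int, str]]:
    """Return (line number, line) for each line containing text,
    in line-number order."""
    return [(n, lines[n]) for n in sorted(lines) if text in lines[n]]


# Hypothetical three-line Fortran fragment used as the edit file.
program = {
    10: "      DO 20 I = 1, N",
    20: "   20 SUM = SUM + X(I)",
    30: "      PRINT 100, SUM",
}
matches = find(program, "SUM")  # lines 20 and 30 contain "SUM"
```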

With minor changes it is possible to perform
psychological experimentation, reaction time studies, or even
automated questionnaire presentation. A further enhancement
would be to interface this system with other systems. The
NMS 85/P has an RS-232C interface, so all that is required
is to change the current submission module to handle the
protocol of a different system. This would allow a
programmer's workbench arrangement for the user, in which an
audit trail could be obtained from performance with different
languages while a uniform environment is presented to the
user.